13 research outputs found

    Help, It Looks Confusing: GUI Task Automation Through Demonstration and Follow-up Questions

    Non-programming users should be able to create their own customized scripts to perform computer-based tasks for them, just by demonstrating to the machine how it's done. To that end, we develop a learning-by-demonstration system prototype called HILC (Help, It Looks Confusing). Users train HILC to synthesize a task script by demonstrating the task, which produces the needed screenshots and their corresponding mouse-keyboard signals. After the demonstration, the user answers follow-up questions. We propose a user-in-the-loop framework that learns to generate scripts of actions performed on visible elements of graphical applications. While pure programming-by-demonstration is still unrealistic, we use quantitative and qualitative experiments to show that non-programming users are willing and effective at answering follow-up queries posed by our system. Our models of events and appearance are surprisingly simple, but are combined effectively to cope with varying amounts of supervision. The best available baseline, Sikuli Slides, struggled with the majority of the tests in our user study experiments. The prototype with our proposed approach successfully helped users accomplish simple linear tasks, complicated tasks (monitoring, looping, and mixed), and tasks that span multiple executables. Even when both systems could ultimately perform a task, ours was trained and refined by the user in less time.

    Harmonic Networks: Deep Translation and Rotation Equivariance

    Translating or rotating an input image should not affect the results of many computer vision tasks. Convolutional neural networks (CNNs) are already translation equivariant: input image translations produce proportionate feature map translations. This is not the case for rotations. Global rotation equivariance is typically sought through data augmentation, but patch-wise equivariance is more difficult. We present Harmonic Networks, or H-Nets, a CNN exhibiting equivariance to patch-wise translation and 360° rotation. We achieve this by replacing regular CNN filters with circular harmonics, returning a maximal response and orientation for every receptive field patch. H-Nets use a rich, parameter-efficient representation with low computational complexity, and we show that deep feature maps within the network encode complicated rotational invariants. We demonstrate that our layers are general enough to be used in conjunction with the latest architectures and techniques, such as deep supervision and batch normalization. We also achieve state-of-the-art classification on rotated-MNIST, and competitive results on other benchmark challenges.

    Synthesizing and Editing Photo-realistic Visual Objects

    In this thesis we investigate novel methods of synthesizing new images of a deformable visual object using a collection of images of the object. We investigate both parametric and non-parametric methods, as well as a combination of the two, for the problem of image synthesis. Our main focus is complex visual objects, specifically deformable objects and objects with varying numbers of visible parts. We first introduce a sketch-driven image synthesis system, which allows the user to draw ellipses and outlines in order to sketch a rough shape of animals as a constraint on the synthesized image. This system interactively provides feedback in the form of ellipse and contour suggestions to the partial sketch of the user. The user's sketch guides the non-parametric synthesis algorithm that blends patches from two exemplar images in a coarse-to-fine fashion to create a final image. We evaluate the method and synthesized images through two user studies. Instead of non-parametric blending of patches, a parametric model of the appearance is more desirable, as its appearance representation is shared between all images of the dataset. Hence, we propose Context-Conditioned Component Analysis (C-CCA), a probabilistic generative parametric model, which describes images with a linear combination of basis functions. The basis functions are evaluated for each pixel using a context vector computed from the local shape information. We evaluate C-CCA qualitatively and quantitatively on inpainting, appearance transfer and reconstruction tasks. Drawing samples of C-CCA generates novel, globally-coherent images, which, unfortunately, lack high-frequency details due to dimensionality reduction and misalignment. We develop a non-parametric model that enhances the samples of C-CCA with locally-coherent, high-frequency details. The non-parametric model efficiently finds patches from the dataset that match the C-CCA sample and blends the patches together. We analyze the results of the combined method on the datasets of horse and elephant images.
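The non-parametric refinement step can be caricatured in a few lines. The sketch below is an illustrative simplification, not the thesis implementation (it uses non-overlapping patches and no blending): each patch of a smooth sample is replaced by its nearest-neighbour patch from an exemplar image.

```python
import numpy as np

def enhance(sample, exemplar, patch=4):
    """Replace each non-overlapping patch of `sample` with its nearest
    neighbour (L2) among all patches of `exemplar` -- a toy version of
    adding locally-coherent high-frequency detail to a smooth sample."""
    lib = np.array([exemplar[y:y + patch, x:x + patch].ravel()
                    for y in range(exemplar.shape[0] - patch + 1)
                    for x in range(exemplar.shape[1] - patch + 1)])
    out = np.empty_like(sample)
    for y in range(0, sample.shape[0], patch):
        for x in range(0, sample.shape[1], patch):
            q = sample[y:y + patch, x:x + patch].ravel()
            idx = np.argmin(((lib - q) ** 2).sum(axis=1))
            out[y:y + patch, x:x + patch] = lib[idx].reshape(patch, patch)
    return out

rng = np.random.default_rng(1)
exemplar = rng.standard_normal((12, 12))
sample = exemplar[:8, :8].copy()   # every patch of it exists in the exemplar
restored = enhance(sample, exemplar)
```

When the query patches exist verbatim in the exemplar, the reconstruction is exact; the interesting case is when they only approximately match and real high-frequency texture is carried over.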

    Two-View Geometry Scoring Without Correspondences

    Camera pose estimation for two-view geometry traditionally relies on RANSAC. Normally, a multitude of image correspondences leads to a pool of proposed hypotheses, which are then scored to find a winning model. The inlier count is generally regarded as a reliable indicator of 'consensus'. We examine this scoring heuristic, and find that it favors disappointing models under certain circumstances. As a remedy, we propose the Fundamental Scoring Network (FSNet), which infers a score for a pair of overlapping images and any proposed fundamental matrix. It does not rely on sparse correspondences, but rather embodies a two-view geometry model through an epipolar attention mechanism that predicts the pose error of the two images. FSNet can be incorporated into traditional RANSAC loops. We evaluate FSNet on fundamental and essential matrix estimation on indoor and outdoor datasets, and establish that FSNet can successfully identify good poses for pairs of images with few or unreliable correspondences. Besides, we show that naively combining FSNet with the MAGSAC++ scoring approach achieves state-of-the-art results.
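The role FSNet plays in the loop is easiest to see with a toy RANSAC where the hypothesis scorer is a plug-in. The sketch below fits a 2-D line rather than a fundamental matrix, and the default scorer is the classic inlier count; a learned scorer could be passed as `score_fn` in exactly the way an FSNet-style network would rank two-view hypotheses. This is an illustration of the pattern, not the paper's pipeline.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line a*x + b*y + c = 0 through two points, unit-normalized."""
    a, b = q[1] - p[1], p[0] - q[0]
    c = -(a * p[0] + b * p[1])
    return np.array([a, b, c]) / np.hypot(a, b)

def point_line_dist(line, pts):
    """Perpendicular distance of each point to a unit-normalized line."""
    return np.abs(pts @ line[:2] + line[2])

def ransac(points, n_iters=200, thresh=0.1, score_fn=None, seed=0):
    """Minimal RANSAC. `score_fn(model, points)` ranks hypotheses; the
    default is the classic inlier count, and a learned scorer can be
    swapped in without touching the loop."""
    rng = np.random.default_rng(seed)
    if score_fn is None:
        score_fn = lambda m, pts: int((point_line_dist(m, pts) < thresh).sum())
    best_model, best_score = None, -np.inf
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        model = line_through(points[i], points[j])
        score = score_fn(model, points)
        if score > best_score:
            best_model, best_score = model, score
    return best_model

# Demo: 20 points on y = 2x + 1 plus five gross outliers
xs_line = np.linspace(-1.0, 1.0, 20)
inliers = np.stack([xs_line, 2 * xs_line + 1], axis=1)
outliers = np.array([[5.0, -3.0], [-4.0, 6.0], [3.0, -7.0], [-6.0, -2.0], [7.0, 4.0]])
pts = np.vstack([inliers, outliers])
model = ransac(pts, thresh=0.05)
```

The point of the abstraction is that scoring, not hypothesis generation, is what the paper replaces.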

    HILC: Domain-Independent PbD System Via Computer Vision and Follow-Up Questions

    Creating automation scripts for tasks involving Graphical User Interface (GUI) interactions is hard. It is challenging because not all software applications allow access to a program’s internal state, nor do they all have accessibility APIs. Although much of the internal state is exposed to the user through the GUI, it is hard to programmatically operate the GUI’s widgets. To that end, we developed a system prototype that learns by demonstration, called HILC (Help, It Looks Confusing). Users, both programmers and non-programmers, train HILC to synthesize a task script by demonstrating the task. A demonstration produces the needed screenshots and their corresponding mouse-keyboard signals. After the demonstration, the user answers follow-up questions. We propose a user-in-the-loop framework that learns to generate scripts of actions performed on visible elements of graphical applications. Although pure programming by demonstration is still unrealistic due to a computer’s limited understanding of user intentions, we use quantitative and qualitative experiments to show that non-programming users are willing and effective at answering follow-up queries posed by our system, to help with confusing parts of the demonstrations. Our models of events and appearances are surprisingly simple but are combined effectively to cope with varying amounts of supervision. The best available baseline, Sikuli Slides, struggled to assist users in the majority of the tests in our user study experiments. The prototype with our proposed approach successfully helped users accomplish simple linear tasks, complicated tasks (monitoring, looping, and mixed), and tasks that span across multiple applications. Even when both systems could ultimately perform a task, ours was trained and refined by the user in less time.
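A pixel-level flavour of what such a system must do can be sketched with template matching: locate a recorded widget screenshot inside the current screen and derive a click point. This is a minimal illustration of that one sub-problem (plain normalized cross-correlation on a synthetic screen), not HILC's actual models of events and appearance.

```python
import numpy as np

def locate_template(screen, patch):
    """Return the (row, col) of the best match of `patch` in `screen`
    using normalized cross-correlation, the kind of pixel-level matching
    that GUI automation tools use to find widgets in screenshots."""
    H, W = screen.shape
    h, w = patch.shape
    p = patch - patch.mean()
    pn = np.linalg.norm(p)
    best_score, best_pos = -1.0, (0, 0)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            win = screen[y:y + h, x:x + w]
            wz = win - win.mean()
            denom = np.linalg.norm(wz) * pn
            score = float((wz * p).sum() / denom) if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score

# A synthetic 'screenshot' with one textured widget at row 12, col 20
screen = np.zeros((40, 60))
screen[12:18, 20:30] = np.arange(60, dtype=float).reshape(6, 10)
patch = screen[12:18, 20:30].copy()    # the recorded widget screenshot
(y, x), score = locate_template(screen, patch)
click_point = (y + 3, x + 5)           # centre of the matched widget
```

A real system would then replay the recorded mouse-keyboard signal at `click_point`; the follow-up questions exist precisely for the cases where this matching is ambiguous.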

    Interpretable transformations with Encoder-Decoder Networks

    Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the relative feature space relationship between two rotated images? What is decoded when we interpolate in feature space? Ideally, we want to disentangle confounding factors, such as pose, appearance, and illumination, from object identity. Disentangling these is difficult because they interact in very nonlinear ways. We propose a simple method to construct a deep feature space, with explicitly disentangled representations of several known transformations. A person or algorithm can then manipulate the disentangled representation, for example, to re-render an image with explicit control over parameterized degrees of freedom. The feature space is constructed using a transforming encoder-decoder network with a custom feature transform layer, acting on the hidden representations. We demonstrate the advantages of explicit disentangling on a variety of datasets and transformations, and as an aid for traditional tasks, such as classification.
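The "feature transform layer" idea can be reduced to a few lines: if hidden units are grouped into 2-D pairs, a known transformation such as rotation becomes an explicit, interpretable operation on the code itself. The sketch below shows only that mechanism (with a toy random code, not the paper's network); the key property is that feature-space rotations compose exactly like image-space ones.

```python
import numpy as np

def feature_transform(h, theta):
    """Treat a hidden vector as a list of (x, y) pairs and rotate every
    pair by `theta` -- an explicit transformation acting directly on the
    latent representation, as a feature transform layer would."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return (h.reshape(-1, 2) @ R.T).ravel()

rng = np.random.default_rng(0)
code = rng.standard_normal(16)            # a toy 16-dim hidden code (8 pairs)
composed = feature_transform(feature_transform(code, 0.4), 0.3)
direct = feature_transform(code, 0.7)     # rotations compose additively
identity = feature_transform(code, 2 * np.pi)
```

In the full model, the encoder is trained so that applying this layer to the code of one image decodes to the correspondingly transformed image.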

    Interactive Sketch-Driven Image Synthesis

    We present an interactive system for composing realistic images of an object under arbitrary pose and appearance specified by sketching. Our system draws inspiration from a traditional illustration workflow: The user first sketches rough 'masses' of the object, as ellipses, to define an initial abstract pose that can then be refined with more detailed contours as desired. The system is made robust to partial or inaccurate sketches using a reduced-dimensionality model of pose space learnt from a labelled collection of photos. Throughout the composition process, interactive visual feedback is provided to guide the user. Finally, the user's partial or complete sketch, complemented with appearance requirements, is used to constrain the automatic synthesis of a novel, high-quality, realistic image.
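One way to read "reduced-dimensionality model of pose space" is a PCA subspace fitted to labelled pose vectors, onto which a rough sketch pose is snapped. The sketch below illustrates that reading under stated assumptions (PCA on stacked pose coordinates; not necessarily the paper's exact model):

```python
import numpy as np

def fit_pose_subspace(poses, k=2):
    """PCA on a collection of pose vectors (e.g. stacked ellipse
    parameters of the sketched 'masses'), keeping the top-k directions."""
    mean = poses.mean(axis=0)
    _, _, Vt = np.linalg.svd(poses - mean, full_matrices=False)
    return mean, Vt[:k]

def project_pose(x, mean, basis):
    """Snap a rough or partial sketch pose onto the learnt pose space,
    which is what makes the system robust to inaccurate input."""
    return mean + (x - mean) @ basis.T @ basis

# Toy 'labelled collection': 6-D poses that truly live on a 2-D subspace
rng = np.random.default_rng(0)
B = rng.standard_normal((2, 6))
poses = 1.5 + rng.standard_normal((50, 2)) @ B
mean, basis = fit_pose_subspace(poses, k=2)
snapped = project_pose(poses[0], mean, basis)
```

A valid training pose projects to itself, while an off-subspace (inaccurate) sketch lands on the nearest plausible pose, which can then drive contour suggestions.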

    Shazam For Bats: Internet of Things for Continuous Real-Time Biodiversity Monitoring

    Biodiversity surveys are often required for development projects in cities that could affect protected species such as bats. Bats are important biodiversity indicators of the wider health of the environment, and activity surveys of bat species are used to report on the performance of mitigation actions. Typically, sensors are used in the field to listen to the ultrasonic echolocation calls of bats, or the audio data is recorded for post-processing to calculate the activity levels. Current methods rely on significant human input; this presents an opportunity for continuous monitoring and in situ machine-learning detection of bat calls in the field. Here, we show the results from a longitudinal study of 15 novel Internet-connected bat sensors—Echo Boxes—in a large urban park. The study provided empirical evidence of how edge processing can reduce network traffic and storage demands by several orders of magnitude, making it possible to run continuous monitoring activities for many months, including periods which traditionally would not be monitored. Our results demonstrate how the combination of artificial intelligence techniques and low-cost sensor networks can be used to create novel insights for ecologists and conservation decision-makers.
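The "orders of magnitude" claim is easy to sanity-check with back-of-the-envelope arithmetic. All the numbers below are illustrative assumptions (a typical full-spectrum detector sample rate, a guessed nightly call count and event record size), not figures from the study:

```python
# Illustrative arithmetic only -- sample rate, call rate, and event size
# are assumptions for the sketch, not measurements from the paper.
SAMPLE_RATE_HZ = 384_000          # typical full-spectrum ultrasonic rate
BYTES_PER_SAMPLE = 2              # 16-bit mono audio
SECONDS_PER_DAY = 86_400

# Shipping raw audio off the sensor:
raw_bytes_per_day = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * SECONDS_PER_DAY

# Shipping only in situ detection events (timestamp, label, confidence):
CALLS_PER_DAY = 2_000             # assumed nightly activity level
BYTES_PER_EVENT = 100
event_bytes_per_day = CALLS_PER_DAY * BYTES_PER_EVENT

reduction = raw_bytes_per_day / event_bytes_per_day
```

Under these assumptions raw audio is tens of gigabytes per sensor per day, while detection events are a few hundred kilobytes, a reduction of roughly five orders of magnitude, consistent with the qualitative claim above.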

    Quantitative analysis of morphology of porous silicon nanostructures formed by metal-assisted chemical etching

    Morphology features of porous layers consisting of silicon nanowire arrays, which were grown by metal-assisted chemical etching, have been analyzed by means of digital processing of their scanning electron microscopy (SEM) images. Informational-entropic and Fourier analysis have been applied to quantitatively describe the degree of order and chaos in nanostructure distribution in the layers. Self-similarity of the layer morphology has been quantitatively described via its fractal dimensions. The applied approach allows us to distinguish morphological features of so-called "black" (more ordered) and "white" (less ordered) silicon layers, characterized by minimal and maximal optical reflection, respectively. This work was partially supported by the Committee of Science of the Ministry of Education and Science of the Republic of Kazakhstan (Grant AP05132854, Grant AP05132738). G.K.M. and V.Yu.T. acknowledge the support of the Comprehensive Program of NRNU “MEPhI”.
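A fractal dimension of a binary SEM mask is commonly estimated by box counting; the generic sketch below (not the authors' code) counts occupied boxes N(s) at several box sizes s and fits the slope of log N versus log(1/s):

```python
import numpy as np

def box_counting_dimension(img, sizes=(1, 2, 4, 8, 16)):
    """Box-counting dimension of a binary image: count occupied s x s
    boxes N(s) for several sizes s, then fit the slope of
    log N(s) versus log(1/s)."""
    counts = []
    for s in sizes:
        H, W = img.shape
        h, w = H - H % s, W - W % s          # trim so boxes tile exactly
        boxes = img[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(boxes.any(axis=(1, 3)).sum())
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

# Sanity checks on shapes of known dimension
filled = np.ones((64, 64), dtype=bool)   # a filled square: dimension ~2
line = np.zeros((64, 64), dtype=bool)
line[32, :] = True                        # a straight line: dimension ~1
d_filled = box_counting_dimension(filled)
d_line = box_counting_dimension(line)
```

On a thresholded SEM image of a nanowire layer, the same estimator yields a non-integer value between these extremes, quantifying the self-similarity discussed in the abstract.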